---
title: Tantivy Aggregates
---

<Info>
  Tantivy aggregates are a ParadeDB enterprise feature. [Contact
  us](mailto:sales@paradedb.com) for access.
</Info>

<Note>
  All fields referenced by a Tantivy aggregate JSON string must be indexed as
  [fast fields](/documentation/indexing/fast_fields).
</Note>

In addition to plain SQL aggregates, ParadeDB also has the ability to compute aggregates over a single BM25 index by accepting JSON query strings.

These aggregates can be more performant than plain SQL aggregates over some datasets.

## Syntax

`paradedb.aggregate` accepts three arguments: the name of the BM25 index, a full text search query builder function,
and a Tantivy aggregate JSON.

```sql
SELECT * FROM paradedb.aggregate(
    '<index_name>',
    <search_query>,
    '<aggregate_query>'
);
```

<ParamField body="index_name" required>
  The name of the BM25 index as a string.
</ParamField>
<ParamField body="search_query" required>
  A full text search query builder function. The aggregate will be computed over
  the results of this function.
</ParamField>
<ParamField body="aggregate_query" required>
  A Tantivy aggregate JSON string. See the sections below for how to construct
  these JSONs.
</ParamField>

## Count

A count aggregation tallies the number of values for the specified field across all documents.

```sql
SELECT * FROM paradedb.aggregate(
    'search_idx',
    paradedb.all(),
    '{
        "rating_total": {
            "sum": {"field": "rating"}
        }
    }'
);
```

<ParamField body="field" required>
  The field name to compute the count on.
</ParamField>
<ParamField body="missing">
  The value to use for documents missing the field. By default, missing values
  are ignored.
</ParamField>

## Average

An average aggregation calculates the mean of the specified numeric field values across all documents.

```sql
SELECT * FROM paradedb.aggregate(
    'search_idx',
    paradedb.all(),
    '{
        "avg_rating": {
            "avg": {"field": "rating"}
        }
    }'
);
```

<ParamField body="field" required>
  The field name to compute the average on.
</ParamField>
<ParamField body="missing">
  The value to use for documents missing the field. By default, missing values
  are ignored.
</ParamField>

## Sum

A sum aggregation computes the total sum of the specified numeric field values across all documents.

```sql
SELECT * FROM paradedb.aggregate(
    'search_idx',
    paradedb.all(),
    '{
        "rating_total": {
            "sum": {"field": "rating"}
        }
    }'
);
```

<ParamField body="field" required>
  The field name to compute the sum on.
</ParamField>
<ParamField body="missing">
  The value to use for documents missing the field. By default, missing values
  are ignored.
</ParamField>

## Min

A min aggregation finds the smallest value for the specified numeric field across all documents.

```sql
SELECT * FROM paradedb.aggregate(
    'search_idx',
    paradedb.all(),
    '{
        "min_rating": {
            "min": {"field": "rating"}
        }
    }'
);
```

<ParamField body="field" required>
  The field name to compute the minimum on.
</ParamField>
<ParamField body="missing">
  The value to use for documents missing the field. By default, missing values
  are ignored.
</ParamField>

## Max

A max aggregation finds the largest value for the specified numeric field across all documents.

```sql
SELECT * FROM paradedb.aggregate(
    'search_idx',
    paradedb.all(),
    '{
        "max_rating": {
            "max": {"field": "rating"}
        }
    }'
);
```

<ParamField body="field" required>
  The field name to compute the maximum on.
</ParamField>
<ParamField body="missing">
  The value to use for documents missing the field. By default, missing values
  are ignored.
</ParamField>

## Stats

A stats aggregation provides a collection of statistical metrics for the specified numeric field, including count, sum, average, min, and max.

```sql
SELECT * FROM paradedb.aggregate(
    'search_idx',
    paradedb.all(),
    '{
        "rating_stats": {
            "stats": {"field": "rating"}
        }
    }'
);
```

<ParamField body="field" required>
  The field name to compute the stats on.
</ParamField>
<ParamField body="missing">
  The value to use for documents missing the field. By default, missing values
  are ignored.
</ParamField>

## Percentiles

The percentiles aggregation calculates the values below which given percentages of the data fall, providing insights into the distribution of a dataset.

```sql
SELECT * FROM paradedb.aggregate(
    'search_idx',
    paradedb.all(),
    '{
        "rating_percentiles": {
            "percentiles": {"field": "rating"}
        }
    }'
);
```

<ParamField body="field" required>
  The field name to compute the percentiles on.
</ParamField>
<ParamField body="percents" default={[1.0, 5.0, 25.0, 50.0, 75.0, 95.0, 99.0]}>
  The percentiles to compute.
</ParamField>
<ParamField body="keyed" default={false}>
  Whether to return the percentiles as a hash map.
</ParamField>
<ParamField body="missing">
  The value to use for documents missing the field. By default, missing values
  are ignored.
</ParamField>

## Cardinality

A cardinality aggregation estimates the number of unique values in the specified field using the HyperLogLog++ algorithm.
This is useful for understanding the uniqueness of values in a large dataset.

<Note>
  The cardinality aggregation provides an approximate count, which is accurate
  within a small error range. This trade-off allows for efficient computation
  even on very large datasets.
</Note>

```sql
SELECT * FROM paradedb.aggregate(
    'search_idx',
    paradedb.all(),
    '{
        "unique_users": {
            "cardinality": {"field": "user_id", "missing": "unknown"}
        }
    }'
);
```

<ParamField body="field" required>
  The field name to compute the cardinality on.
</ParamField>
<ParamField body="missing">
  The value to use for documents missing the field. By default, missing values
  are ignored.
</ParamField>

## Histogram

Histogram is a bucket aggregation where buckets are created dynamically based on a specified interval. Each document value is rounded down to its bucket. For example, if you have a price of 18 and an interval of 5, the document will fall into the bucket with the key 15. The formula used for this is: `((val - offset) / interval).floor() * interval + offset`.

```sql
SELECT * FROM paradedb.aggregate(
    'search_idx',
    paradedb.all(),
    '{
        "rating_histogram": {
            "histogram": {"field": "rating", "interval": 1}
        }
    }'
);
```

<ParamField body="field" required>
  The field to aggregate on.
</ParamField>
<ParamField body="interval" required>
  The interval to chunk your data range. Each bucket spans a value range of
  [0..interval). Must be a positive value.
</ParamField>
<ParamField body="offset" default={0.0}>
  Shift the grid of buckets by the specified offset.
</ParamField>
<ParamField body="min_doc_count" default={0}>
  The minimum number of documents in a bucket to be returned.
</ParamField>
<ParamField body="hard_bounds">
  Limits the data range to [min, max] closed interval.
</ParamField>
<ParamField body="extended_bounds">
  Extends the value range of the buckets.
</ParamField>
<ParamField body="keyed" default={false}>
  Whether to return the buckets as a hash map.
</ParamField>
<ParamField body="is_normalized_to_ns" default={false}>
  Whether the values are normalized to ns for date time values.
</ParamField>

## Date Histogram

Similar to histogram, but can only be used with datetime types. Currently, only fixed time intervals are supported.

```sql
SELECT * FROM paradedb.aggregate(
    'search_idx',
    paradedb.all(),
    '{
        "created_at_histogram": {
            "date_histogram": {"field": "created_at", "fixed_interval": "1h"}
        }
    }'
);
```

<ParamField body="field" required>
  The field to aggregate on.
</ParamField>
<ParamField body="fixed_interval" required>
  The interval to chunk your data range. Each bucket spans a value range of
  [0..fixed_interval). Accepted values should end in `ms`, `s`, `m`, `h`, or
  `d`.
</ParamField>
<ParamField body="offset" default={0}>
  Shift the grid of buckets by the specified offset.
</ParamField>
<ParamField body="min_doc_count" default={0}>
  The minimum number of documents in a bucket to be returned.
</ParamField>
<ParamField body="hard_bounds">
  Limits the data range to [min, max] closed interval.
</ParamField>
<ParamField body="extended_bounds">
  Extends the value range of the buckets.
</ParamField>
<ParamField body="keyed" default={false}>
  Whether to return the buckets as a hash map.
</ParamField>

## Range

Range allows you to define custom buckets for specific ranges.

```sql
SELECT * FROM paradedb.aggregate(
    'search_idx',
    paradedb.all(),
    '{
        "ranges": {
            "range": {"field": "rating", "ranges": [
                { "to": 3.0 },
                { "from": 3.0, "to": 7.0 },
                { "from": 7.0, "to": 20.0 },
                { "from": 20.0 }
            ]}
        }
    }'
);
```

<ParamField body="field" required>
  The field to aggregate on.
</ParamField>
<ParamField body="ranges" required>
  A list of ranges to aggregate on.
</ParamField>
<ParamField body="keyed" default={false}>
  Whether to return the buckets as a hash map.
</ParamField>

## Terms

Terms creates a bucket for every unique term and counts the number of occurrences.

```sql
SELECT * FROM paradedb.aggregate(
    'search_idx',
    paradedb.all(),
    '{
        "rating_terms": {
            "terms": {"field": "rating"}
        }
    }'
);
```

<ParamField body="field" required>
  The field to aggregate on.
</ParamField>
<ParamField body="size" default={10}>
  The number of terms to return.
</ParamField>
<ParamField body="segment_size" default={100}>
  The number of terms to fetch from each segment.
</ParamField>
<ParamField body="show_term_doc_count_error" default={false}>
  Whether to include the document count error.
</ParamField>
<ParamField body="min_doc_count" default={1}>
  The minimum number of documents in a term to be returned.
</ParamField>
<ParamField body="order">The order in which to return the terms.</ParamField>
<ParamField body="missing">
  The value to use for documents missing the field.
</ParamField>

## Nested Aggregations

Buckets can contain sub-aggregations. For example, creating buckets with the range aggregation and then calculating the average on each bucket:

```sql
SELECT * FROM paradedb.aggregate(
    'search_idx',
    paradedb.all(),
    '{
        "range_rating": {
            "range": {
            "field": "rating",
            "ranges": [
                { "from": 1, "to": 3 },
                { "from": 3, "to": 5 }
            ]
            },
            "aggs": {
            "average_in_range": { "avg": { "field": "rating"} }
            }
        }
    }'
);
```
